-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: New string conversion API #951
base: start-8
Are you sure you want to change the base?
Conversation
Replaces a lot of pointers with `std::span`, adds `source_location`. Keeping the new API backwards-compatible by checking at compile time which API each of a traits struct's conversion functions implement.
These versions of the functions live _outside_ the traits types. They help hide the changes in the string conversion API in libpqxx 7.
I'm working to improve memory safety and verifiability, and part of that is reducing the use of raw pointers. As it stands, the new conversion API still has one pointer in it — Question is: should the function return a view on the string it wrote into the buffer? Or should it return the span of remaining buffer space, keeping it convenient to append another value? It's not a completely trivial question because there's a terminating zero between the two. Returning the pointer is elegant in that it's the only piece of information that the caller wants and doesn't already have. I could also just return an index instead of a pointer, but I feel that would just complicate subsetting operations, and invite subtle bugs. I may simply have to try each of the alternatives and compare them. |
Actually, now that I've looked at the uses... returning an index into the buffer does look like the solution that best fits the callers' needs! And that means it's a simple solution, which in turn probably makes it less error-prone. |
Probably lots more changes... It was a hard job.
This goes further than merely testing the values these functions return. It also checks for buffer overruns, memory overwrites, terminating zero in the right place, and so on.
The new string conversion API will include encoding information. We need that for array/composite parsing. I was thinking to hide this away in a class, but... it's starting to sound like too much overhead, both for the hardware and for humans. We've had this "encoding groups" system for years now, and haven't seen much need to change it. There are _some_ changes: I just added an `UNKNOWN` value to the enum so we can deal at least somewhat gracefully with cases where we don't have the information. And I'm looking forward in the 8.0 release to retiring a few of the encoding groups. Perhaps there are ones we'll want to add as well, but adding is cheap. Compilers and tooling will warn about unhandled cases in a `switch`, and that gives me more confidence that people won't be surprised by unexpected new groups. At least, nobody who cares about keeping their code correct, enables compiler warnings, and pays attention.
This changes the string conversion API, but also introduces a mechanism for maintaining backward compatibility for existing string conversions, so as to minimise the trouble for users who implement their own.
It works like this: string conversion for a type is defined by its specialisation of
pqxx::string_traits
. This type contains a few conversion functions:to_buf()
,into_buf()
,from_string()
. The signatures for these functions are changing, but in ways that we can reasonably "translate" both ways.So these new generic functions outside of the traits types hide the difference. They expose only the new-style API, but they each can call either an old-style implementation or a new-style implementation (preferring the new-style one, of course). The use of function concepts for this avoids niggling incompatibilities due to optional extra parameters, or slight differences in types. If the call works, the API matches.
The new API reduces the use of raw pointers, in favour of
std::span<char>
buffers. It also introduces an optionalstd::source_location
argument. And, in the near future, I hope to add some encapsulation of client encoding information so that we can parse some things that aren't safe to parse right now.